Abstract

During L/U decomposition of a sparse matrix, it is possible to perform computation on many diagonal elements simultaneously. Pivots that can be processed in parallel are related by a compatibility relation and are grouped in a compatible set. The collection of all maximal compatibles yields the different maximum-sized sets of pivots that can be processed in parallel. Generation of the maximal compatibles is based on the information obtained from an incompatible table, which lists pairs of incompatible pivots. In this paper, generation of the maximal compatibles of pivot elements for a class of small sparse matrices is studied first. The algorithm involves a binary tree search and has complexity exponential in the order of the matrix. Different strategies for selecting a set of compatible pivots based on the Markowitz criterion are investigated. The competing issues of parallelism and fill-in generation are studied and results are provided. A technique for obtaining an ordered compatible set directly from the ordered incompatible table is given; this technique generates a set of compatible pivots with the property of generating few fills. A new heuristic algorithm is then proposed that combines the idea of an ordered compatible set with a limited binary tree search to generate several sets of compatible pivots in linear time. Finally, an elimination set to reduce the matrix is selected, and parameters are suggested to obtain a balance between parallelism and fill-ins. Results of applying the proposed algorithms to several large application matrices are presented and analyzed.
Introduction

Solution of a linear system of equations is required in many application programs. One such area is VLSI circuit simulation. Every computer-aided circuit analysis program includes a routine that solves a system of sparse linear equations. If implicit integration is used, at every time step one must solve a system of nonlinear equations (usually by Newton iteration), and at every iteration a system of linear equations must be solved. Depending on the integration method, the number of times that a sparse system of linear equations needs to be solved may be large. If the solution time for the sparse system can be reduced, the total circuit analysis time would be significantly reduced. One method for solving such a system is factorization of the matrix into lower and upper triangular matrices followed by forward and back substitution.

One promising area for advances in solution technique is the use of parallel computers and parallel algorithms. Our previous work on parallelizing the MA28 [1] sparse matrix package for the HEP [2] multiprocessor suggests that sufficient parallelism is not obtainable in sparse L/U decomposition without processing multiple pivots in parallel [3]. Parallel pivoting strategies have been investigated by Calahan [4] and more recently by Wing and Huang [5], [6], Jess and Kees [7], and Peters [8]. Although the number of operations possible in parallel may be large in a very sparse system, exploiting all the available parallelism may significantly increase the generation of fill-ins (zero elements of the matrix becoming nonzero as a result of elimination). Since fill-in increases the total computational work, it is important to keep the number generated under control. The purpose of this work is to study sparse L/U decomposition on a multiprocessor by means of an algorithm which exploits parallel pivots and keeps fill-in low. The class of sparse systems guiding the study will be those arising from the simulation of VLSI circuits using a program such as SPICE [9].

Wing and Huang [5] represent the triangulation process by a directed graph in which the vertices represent a divide or update operation (the operations required for performing the triangulation) and the edges determine the precedence relation of the operations to be executed. By assigning level numbers to the directed graph, they identify all operations on the same level as operations to be done in parallel. They use a weighted combination of fill-in cost and depth of computation in a heuristic to determine a nearly optimal pivot sequence. While Wing and Huang identify all the operations that can be done in parallel, we will identify all pivots that can be processed in parallel at each step. An issue that has not been discussed in the literature is that in a sparse matrix there are usually different sets of possible pivot candidates for each step, and the sizes of these sets may well vary. It seems important to study these possibilities and the effect of parallel pivoting on application matrices. Algorithms identifying parallel pivot candidates are complex, so developing them is worthwhile only if the amount of parallelism in circuit-domain matrices is large enough to justify the computation required to identify it.
In this paper, we assume a shared-memory MIMD model for our parallel computation, in which the total memory address space is accessible uniformly to all parallel units (processes or individual processors). This computational model should provide synchronization mechanisms to allow multiple simultaneous memory updates. If multiple updates are aimed at the same memory cell, the penalty paid is a short delay in access time. Based on this computational model, the first half of this paper is devoted to studying the amount of parallelism that exists in application matrices. This is carried out by producing, for a number of small matrices, all possible sets of pivot candidates which can be processed in parallel at each step. Observations are then made on different strategies for choosing one of the sets produced at each step, and on the resulting generation of fill-ins and possible parallel pivoting steps. The complete and detailed analysis of this study leads us into the second half of the paper, where we describe a fast heuristic algorithm to produce a set of acceptable parallel pivot candidates for reducing the matrix at each step. Issues involved in balancing parallel work and fill-in generation are discussed and verified through simulated results.

Parallel Pivot Candidates 

As mentioned above, the triangulation method used here is sparse L/U decomposition. For simplicity, we only consider the diagonal elements of the matrix as pivot candidates. Note that pivoting usually refers to unsymmetric permutations of the matrix that swap an off-diagonal matrix element with a diagonal element. In this paper, we only consider symmetric permutations of the matrix. Even though we are not pivoting in the above sense, the terms pivot and pivoting are used throughout the paper to refer, respectively, to the diagonal element used to reduce the matrix at a given step and to a symmetric permutation.

In a sparse matrix, two pivots $a_{ii}$ and $a_{jj}$ can be processed in parallel if $a_{ij}$ and $a_{ji}$ are both zero. In other words, during elimination, row $j$ is not involved in the elimination process taking place for pivot $a_{ii}$, and row $i$ is not involved in the process for $a_{jj}$. This statement holds only if we provide correct synchronization for simultaneous updates during elimination with parallel pivot candidates:

1. During elimination, when processing pivots in parallel, it is possible that an element of a nonpivot row needs to be updated by all or some of the parallel processes handling pivots i, j, ... for the current step. In order for each process to obtain a completely updated value resulting from any previous update, the update operation must be performed asynchronously by the parallel processes. On the other hand, the order in which parallel processes update an element is of no importance (except for round-off errors).

2. During elimination, when processing pivots $a_{ii}, a_{jj}, \ldots$ in parallel, it is possible that a fill-in is generated in position $(m,n)$. It is also possible that more than one process tries to generate a fill-in in the same position $(m,n)$. The fill-in at position $(m,n)$ must be created once by one process only, and the other processes will update its value as in 1.
If two pivots $a_{ii}$ and $a_{jj}$ can be processed in parallel, and if $a_{jj}$ and $a_{kk}$ can also be processed in parallel, it does not follow that $a_{ii}$, $a_{jj}$, and $a_{kk}$ can all be processed in parallel. The relation between parallel pivot candidates is reflexive and symmetric, but not transitive, and is thus a compatibility relation. Two pivots related in this way will simply be said to be compatible in what follows. A consequence of the nontransitivity of the compatibility relation is that it classifies the elements of a set into nondisjoint subsets such that all members of a subset are compatible. These subsets are called compatibility classes. Thus, in order to find all possible sets of pivots that can be processed in parallel and are of maximum size, we need to find all maximal compatibles. A maximal compatible is a compatible that is not included in any larger compatible.
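
To make the relation concrete, here is a minimal Python sketch, assuming the sparsity pattern is held as a 0/1 list of lists `B` (B[i][j] is 1 exactly when $a_{ij}$ is nonzero) and pivots are indexed from 0. The helper names `compatible` and `is_compatible_set` and the 3 by 3 pattern are hypothetical; the example shows pivots 0 and 1 compatible, pivots 1 and 2 compatible, yet pivots 0 and 2 incompatible.

```python
def compatible(B, i, j):
    """Diagonal pivots i and j (0-based) are compatible iff a_ij and a_ji are zero.

    Every pivot is treated as compatible with itself (reflexivity).
    """
    return i == j or (B[i][j] == 0 and B[j][i] == 0)


def is_compatible_set(B, pivots):
    """True if every pair of distinct pivots in the set is compatible."""
    ps = sorted(pivots)
    return all(compatible(B, p, q) for k, p in enumerate(ps) for q in ps[k + 1:])


# Hypothetical pattern: a_02 is the only nonzero off-diagonal element.
B = [
    [1, 0, 1],
    [0, 1, 0],
    [0, 0, 1],
]

print(compatible(B, 0, 1), compatible(B, 1, 2))  # True True
print(compatible(B, 0, 2))                       # False: a_02 != 0
print(is_compatible_set(B, {0, 1, 2}))           # False: the relation is not transitive
```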

To clarify the discussion, we define a boolean matrix B for each sparse 
matrix A, such that: 

$$
b_{ij} = \begin{cases} 1 & \text{if } a_{ij} \neq 0 \\ 0 & \text{otherwise} \end{cases}
$$

where $b_{ij}$ and $a_{ij}$ denote elements of B and A respectively.

Several approaches for constructing the set of maximal compatibles exist, and they are all based on the construction of an incompatible table [10]. The incompatible table specifies pairs of incompatible elements. Assume pivots are taken from the diagonal elements of the sparse matrix and are numbered 1 through n, corresponding to the diagonal elements of rows 1 through n. We can then represent the incompatible table as a table consisting of (n-1) columns, where each column i has (n-i) elements. Columns of the table correspond to pivot elements of the matrix. Column one of the table, corresponding to pivot number one, is set to the bit vector resulting from ORing row one and column one of the matrix B and keeping the last (n-1) elements. The same process is repeated for pivot 2 (column 2 of the table) on the submatrix obtained from the original matrix by removing row and column one. For every column of the table that is completely constructed, the corresponding row and column of the matrix are removed. The process is repeated for all pivots in order. It is important to note that the incompatible table is constructed for a given ordering of the sparse matrix. Thus, there are n! different incompatible tables for the n! possible diagonal orderings of an n by n sparse matrix. In what follows, we represent the incompatible table as an array of dimension n, say imptbl(n), whose elements are sets of at most n elements each. Each set corresponds to a column of the table. As an illustrative example, the incompatible table for the matrix A1 of Fig. 1.1.a is given in Fig. 1.1.b.
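
The construction of B and of the incompatible table can be sketched in Python as follows. This is an illustration under assumptions, not the authors' PASCAL code: pivots are 0-based, `order` is a hypothetical parameter giving the diagonal ordering (defaulting to the natural one), and each column is stored as a Python set, mirroring the imptbl(n) array-of-sets representation. Because only pivots later in the ordering are examined, the removal of earlier rows and columns is implicit.

```python
def boolean_pattern(A):
    """B with b_ij = 1 iff a_ij != 0."""
    return [[1 if a != 0 else 0 for a in row] for row in A]


def incompatible_table(B, order=None):
    """Column k lists the pivots after position k that are incompatible with order[k].

    This equals ORing row and column order[k] of B over the trailing submatrix
    (rows/columns of earlier positions count as already removed). The last
    column is always empty, so only the first n-1 columns carry information.
    """
    n = len(B)
    order = list(range(n)) if order is None else list(order)
    table = []
    for k, p in enumerate(order):
        table.append({q for q in order[k + 1:] if B[p][q] or B[q][p]})
    return table


A = [
    [4.0, 0.0, 1.0],
    [0.0, 3.0, 0.0],
    [2.0, 0.0, 5.0],
]
B = boolean_pattern(A)
print(incompatible_table(B))             # [{2}, set(), set()]
print(incompatible_table(B, [2, 1, 0]))  # [{0}, set(), set()]
```

The second call illustrates that a different diagonal ordering yields a different table for the same matrix.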

The maximal compatibles are found by combining the pivot pairs from the incompatible table into larger groups of compatible elements. Several systematic approaches for extracting the maximal compatibles have been suggested, and they all use an exhaustive search routine. The approach that seems most suitable for programming on a digital computer assumes initially that all pivot candidates can be grouped into one set; the information from the incompatible table is then used to detect contradictions and to split the groups where necessary.

[Fig. 1.1.a: Matrix A1. Fig. 1.1.b: Incompatible table of A1.]

This procedure involves searching a binary tree. Initially, it is assumed that all pivots are compatible, so they are grouped in one set consisting of all pivot elements. This set is the root of a binary tree, at level zero. Next, the set of pivots incompatible with pivot one, obtained from the incompatible table, is used to split the set at the root into a left and a right set, constituting level one. The left set consists of all elements of its parent set at level zero except those incompatible with pivot one. The right set consists of the same elements as the parent set except pivot one itself. At the next step, the incompatibility information for pivot two is used to break each set at level one into a left and a right set for level two. Since the matrix is sparse, some of the sets at a given level will not split for some pivots, but they may still contain incompatible elements and will split for some later pivot. Consequently, the binary tree corresponding to this search will not always be a dense tree. This process is repeated until no more splitting of the sets is possible. The leaf sets are then checked, and every set included in a larger leaf set is eliminated. The remaining sets constitute all possible maximal compatibles. Note that the length of a path from the root to a leaf is at most n.

The above process is shown for the example matrix of Fig. 1.1 in Fig. 1.2. Initially, pivots 1 through 7 are grouped together as the starting set. Column one of the incompatible table indicates that pivot 5 is incompatible with pivot one. Thus the starting set is split into two sets, (1,2,3,4,6,7) and (2,3,4,5,6,7). At the next level, each of these two sets is split using the incompatibility information for pivot two from the table, giving four sets. This process is continued until no more splits are possible. At the end, the extra sets (3,4,7) and (4,6,7), which are included in the maximal sets (b) and (c) respectively, are eliminated. The remaining five sets are the maximal compatibles.

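A high-level description of the above procedure is given below:

procedure MAXCOMP(sset, i)

Assumptions:

 - pivot candidates are numbered from 1 to n;
 - initially sset consists of all pivots in the matrix and
   i is the first pivot;
 - imptbl[i] is the set of pivots j > i incompatible with pivot i;
 - every leaf set reached is saved, and leaf sets included in
   a larger leaf set are discarded at the end.

begin
  if sset is a compatible set then
    save sset as a leaf set
  else if (i in sset) and (sset * imptbl[i] <> []) then
  begin
    (* split sset into left and right sets *)
    lset := sset - imptbl[i];   (* keep pivot i, drop its incompatibles *)
    rset := sset - [i];         (* drop pivot i itself *)
    MAXCOMP(lset, i+1);
    MAXCOMP(rset, i+1)
  end
  else
    MAXCOMP(sset, i+1)          (* pivot i does not split sset *)
end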

In the above procedure, many branches do not need to be continued to the completion of the search, since they are included in other subtrees. Moreover, as will be described later, we only need to produce compatible sets of maximum size. Thus, there are many branches in this tree that could be trimmed to limit the amount of search. Even with these features, this algorithm has exponential complexity and serves only to obtain information about sparse matrices.
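
A direct Python rendering of MAXCOMP, followed by the removal of leaf sets included in larger ones, is sketched below. It assumes 0-based pivots and an incompatible table in the imptbl format (a list of sets, column i holding the pivots j > i incompatible with pivot i). The function name is hypothetical, and none of the branch trimming discussed above is attempted, so the cost remains exponential in n.

```python
def maximal_compatibles(imptbl):
    """All maximal compatibles by the binary-tree splitting search (exponential).

    imptbl[i] holds the 0-based pivots j > i that are incompatible with pivot i.
    """
    n = len(imptbl)
    leaves = []

    def is_compatible(s):
        return all(not (imptbl[i] & s) for i in s)

    def maxcomp(sset, i):
        if is_compatible(sset):
            leaves.append(sset)                # leaf set: nothing left to split
        elif i in sset and imptbl[i] & sset:
            maxcomp(sset - imptbl[i], i + 1)   # left: keep i, drop its incompatibles
            maxcomp(sset - {i}, i + 1)         # right: drop pivot i itself
        else:
            maxcomp(sset, i + 1)               # pivot i does not split this set

    maxcomp(frozenset(range(n)), 0)
    unique = set(leaves)
    # Discard every leaf set that is included in a larger leaf set.
    return [s for s in unique if not any(s < t for t in unique)]


# Hypothetical chain: pivots 0-1 and 1-2 are incompatible, 0 and 2 are compatible.
result = maximal_compatibles([{1}, {2}, set()])
print(sorted(sorted(s) for s in result))  # [[0, 2], [1]]
```

In the example the search produces the leaves {0,2}, {1}, and {2}; the last is dropped because it is included in {0,2}.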

To study the issues discussed earlier, a PASCAL program was written to perform symbolic L/U decomposition on a sparse matrix. Since our objective was to study the effects of parallel pivoting, the program performs the decomposition up to the last parallel step and does not continue once parallel pivot candidates are no longer available. The structure of the program is outlined below:

program PIVOTSET

 - Read in the input matrix and construct the matrix structure.

 - Construct all maximal compatibles.

 - If parallel pivoting is not possible, go to Stop.
 - Pick a set of compatible pivots to be processed in parallel.

 - Permute the matrix according to the parallel pivots for this step,
   reduce the matrix, and insert the resultant fill-ins.

 - Repeat from "Construct all maximal compatibles".

 - Stop.
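
The reduction step of PIVOTSET can be illustrated symbolically. The Python sketch below is a hypothetical helper, assuming the pivots passed in are pairwise compatible and that only the 0/1 pattern B matters. Since compatible pivots never touch each other's rows or columns, each pivot's fill-ins can be computed from the same pattern and merged, and a position produced by several pivots is created only once, as required by synchronization point 2. The permutation is implicit: the remaining rows and columns keep their original relative order.

```python
def parallel_step(B, pivots):
    """Symbolically eliminate a set of pairwise-compatible diagonal pivots at once.

    Returns (fills, reduced, remaining): the fill-in positions (in original
    indices), the 0/1 pattern of the remaining submatrix, and the original
    indices of its rows/columns.
    """
    n = len(B)
    P = set(pivots)
    fills = set()  # each position is recorded once even if several pivots fill it
    for k in P:
        rows = [r for r in range(n) if r != k and B[r][k]]
        cols = [c for c in range(n) if c != k and B[k][c]]
        for r in rows:
            for c in cols:
                if not B[r][c]:
                    fills.add((r, c))
    remaining = [i for i in range(n) if i not in P]
    reduced = [[1 if (B[r][c] or (r, c) in fills) else 0 for c in remaining]
               for r in remaining]
    return fills, reduced, remaining


# Hypothetical arrow pattern: pivot 0 is coupled to rows/columns 1 and 2.
B = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 1],
]
fills, reduced, remaining = parallel_step(B, [0])
print(sorted(fills), reduced)    # [(1, 2), (2, 1)] [[1, 1], [1, 1]]
print(parallel_step(B, [1, 2]))  # (set(), [[1]], [0])
```

Eliminating the coupling pivot first creates two fill-ins, while processing the two compatible pivots 1 and 2 in parallel creates none, which is the kind of choice the selection strategies below try to make.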
Analysis Performed

In general, in matrices arising from circuits there are many different sets of compatible pivots of equal maximum size. Depending on how a set is chosen to reduce the matrix at each step, we obtain a different behavior in the generation of fill-in elements and, as a result, different possibilities for continuing parallel pivoting in the next steps. We have studied the issues of fill-in generation and parallelism in pivoting: we used different strategies to select a set of compatible pivots and then obtained statistical information from several circuit matrices generated by the SPICE circuit simulation program.

The Markowitz criterion [11] is well known for minimizing the generation of fill-ins in sparse matrices in sequential programming. It is based on the fact that at step k, the maximum number of fill-ins generated by choosing $a_{ij}$ as pivot is $(r_i - 1)(c_j - 1)$. Here $r_i - 1$ is the number of nonzero elements other than $a_{ij}$ in the i-th row of the reduced matrix, and $c_j - 1$ is the number of nonzero elements other than $a_{ij}$ in column j of the reduced matrix. At step k, Markowitz selects as pivot the element which minimizes $(r_i - 1)(c_j - 1)$. The product $(r_i - 1)(c_j - 1)$ is the Markowitz number of element $a_{ij}$. In what follows, we use the Markowitz idea as a basis for the selection of a compatible pivot set.
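
For diagonal pivots the Markowitz number reduces to $(r_i - 1)(c_i - 1)$. A minimal Python sketch, assuming a 0/1 pattern `B` of the current reduced matrix with nonzero diagonal entries; the function name and the example pattern are hypothetical:

```python
def markowitz_number(B, i):
    """(r_i - 1)(c_i - 1) for the diagonal pivot a_ii of the reduced pattern B."""
    r = sum(B[i])                   # nonzeros in row i, a_ii included
    c = sum(row[i] for row in B)    # nonzeros in column i, a_ii included
    return (r - 1) * (c - 1)


# Hypothetical arrow pattern: pivot 0 is coupled to every other row and column.
B = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 1],
]
numbers = [markowitz_number(B, i) for i in range(len(B))]
print(numbers)                                       # [4, 1, 1]
print(min(range(len(B)), key=lambda i: numbers[i]))  # 1, the sequential Markowitz choice
```

For pivot 0 the bound is 4, yet only two fill-ins, at (1,2) and (2,1), would actually appear because (1,1) and (2,2) are already nonzero: the Markowitz number is an upper bound, not an exact count.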

In our first analysis we compare two different strategies for choosing a set of compatible pivots among all maximal compatibles. In both cases we consider only the sets of maximum size. The first strategy (called Markowitz sum) chooses, among all sets of maximum size, the set in which the sum of the Markowitz numbers of all its elements is minimum. The problem here is that some of the pivots in the set chosen for reducing the matrix may generate fill-ins in the same positions, and thus the sum overestimates the Markowitz count compared with a purely sequential case. As an alternative, a second strategy (called Ored Markowitz) is employed. Here, using the boolean matrix B corresponding to the sparse matrix under consideration, we count the number of nonzeros in the vector that results from ORing the rows of the pivot candidates in the set, and multiply this number by the number of nonzeros in the vector resulting from ORing the columns of the potential pivots.
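
The two selection scores can be sketched in Python as follows. This is our reading of the text, with hypothetical function names: `markowitz_sum` adds the individual Markowitz numbers, while `ored_markowitz` ORs the pivot rows and the pivot columns of B and multiplies the two nonzero counts. The description does not say whether the pivots' own diagonal entries are excluded from the ORed vectors, so this sketch includes them; `pivots` is assumed to be a list of 0-based indices.

```python
def markowitz_sum(B, pivots):
    """Strategy 1: sum of the individual Markowitz numbers of the pivots."""
    n = len(B)
    total = 0
    for i in pivots:
        r = sum(B[i])                       # nonzeros in row i
        c = sum(B[k][i] for k in range(n))  # nonzeros in column i
        total += (r - 1) * (c - 1)
    return total


def ored_markowitz(B, pivots):
    """Strategy 2: nnz(OR of pivot rows) * nnz(OR of pivot columns).

    Diagonal entries of the pivots are counted (an assumption, see above).
    """
    n = len(B)
    rows_or = [any(B[i][c] for i in pivots) for c in range(n)]
    cols_or = [any(B[r][i] for i in pivots) for r in range(n)]
    return sum(rows_or) * sum(cols_or)


# Hypothetical pattern: compatible pivots 0 and 1 share the same neighbour, 2.
B = [
    [1, 0, 1],
    [0, 1, 1],
    [1, 1, 1],
]
print(markowitz_sum(B, [0, 1]), ored_markowitz(B, [0, 1]))  # 2 9
```

Both scores are used only to rank candidate sets of the same size, so their different scales do not matter for the comparison.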

Comparison of the above strategies on our test cases shows that the first method is almost always superior. Our results show that, in general, minimizing the Markowitz sum never produces more fill-ins and often allows more rows to be reduced in parallel steps. This study has shown that the amount of parallelism in circuit matrices is quite high, but that the generation of fill-in terms is also quite high in most cases when compared to sequential runs on the same matrices. The number of potential pivots to be processed in parallel at each step seems to be so high that we could process fewer pivots in parallel in a step without limiting the parallel work considerably. An experiment to study this possibility is performed by picking the maximum-sized set with minimum Markowitz sum, as explained above. This set is then used to reduce the matrix, with the following analysis performed on the set of compatible pivots:
- Discard the pivot with maximum Markowitz count and determine the number of fill-ins that would be generated as a result.

- Repeat the above procedure until no more pivots can be discarded from the set, either because the set size is too small or because the Markowitz sum of the set is zero.
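
The discard analysis can be sketched in Python as below. The function name, the `min_size` threshold standing in for "the set size is too small", and the 5 by 5 pattern are all hypothetical; the fill-in count treats a position produced by several pivots as a single fill-in. In the example, pivot 0 (Markowitz number 4) is discarded first and the fill-ins for the step drop from 2 to 0.

```python
def discard_analysis(B, pivots, min_size=2):
    """Repeatedly drop the pivot with the largest Markowitz count.

    Returns (set size, fill-in count) pairs, starting with the full set.
    """
    n = len(B)

    def markowitz(i):
        return (sum(B[i]) - 1) * (sum(B[r][i] for r in range(n)) - 1)

    def fill_count(P):
        fills = set()  # a position filled by several pivots counts once
        for k in P:
            for r in range(n):
                if r != k and B[r][k]:
                    for c in range(n):
                        if c != k and B[k][c] and not B[r][c]:
                            fills.add((r, c))
        return len(fills)

    current = sorted(pivots)
    history = [(len(current), fill_count(current))]
    # Stop when the set would become too small or every Markowitz number is zero.
    while len(current) > min_size and any(markowitz(i) for i in current):
        current.remove(max(current, key=markowitz))
        history.append((len(current), fill_count(current)))
    return history


# Hypothetical pattern: compatible pivots 0, 1, 2; pivot 0 couples rows 3 and 4.
B = [
    [1, 0, 0, 1, 1],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 1, 0],
    [1, 1, 1, 1, 0],
    [1, 0, 0, 0, 1],
]
print(discard_analysis(B, [0, 1, 2]))  # [(3, 2), (2, 0)]
```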

Although the above analysis of reducing the size of the set of compatible 
pivots was done for each step, the actual elimination and fill for a step was 
done using the maximal compatible with lowest Markowitz sum. This 
analysis is repeated at each parallel step and the results show that it is possi- 
ble to decrease the generation of fill-ins at this step significantly by reducing 
the amount of parallel work slightly. In fact, discarding only one compatible 
pivot results in a decrease of at least about one third in the number of fill-ins 
that would be generated otherwise. 

We also performed this analysis over all generated sets of compatible pivots. In this experiment, we chose the maximum-sized set with minimum Markowitz sum and used it for reducing the matrix, as described below:

- For all sets of maximum size:

    find the pivot with maximum Markowitz count and
    remove it from the set.

- Find the set of maximum size and minimum Markowitz sum
  and determine the number of fill-ins that would be generated
  by processing this set.

- Repeat the above process.

Similar results were obtained by applying the above two procedures to our test matrices. Hence, we use the first method for the next phase; that is, the next analysis is performed on the set of maximum size and minimum Markowitz sum.

Even though the above experiment shows that we can always generate fewer fill-ins at a step by avoiding the maximum possible parallelism, it does not show that the avoided fill-ins are not simply delayed to later steps. In our next experiment, we choose the maximum-sized set with minimum Markowitz sum, but this time we discard the pivot with maximum Markowitz count from the set and use the resulting set for elimination and fill generation. We also repeat the previous analysis by reducing the set size and determining the number of resulting fill-ins. This work confirms our previous result that by discarding some of the parallel pivot candidates with high Markowitz counts we decrease the total generation of fill-ins.

Results of Complete Analyses 

A set of circuits to be simulated by the SPICE circuit simulation program is available as a benchmark for testing SPICE. We used these circuits as input to SPICE and generated their corresponding matrices, which serve as our test cases. The first circuit is a simple differential pair and generates a 16 by 16 matrix with 57 nonzeros. The matrices are small, ranging in size from 12 by 12 to 24 by 24. The complexity of our algorithm for generating all possible maximal sets of compatible pivots does not allow us to test larger matrices, but the generated information provides valuable statistics about parallelism and circuit matrices. An algorithm with tolerable complexity for producing a set of compatible pivots will involve heuristics; therefore, it will not give complete information about the matrix.

The results of comparing the Markowitz sum and Ored Markowitz strategies are summarized in Table 1.1 (tables are provided in Appendix A at the end of this paper). The first column describes the circuit, the order of the matrix, and the number of nonzeros. The second column indicates the parallel pivoting step. Columns 3 to 5 correspond to the Markowitz sum strategy described earlier, and columns 6 through 8 correspond to Ored Markowitz. For each strategy, the first column is the size of the maximum set of pivots obtained at a step, the second column is the minimum operation count obtained for such a set, and the last column specifies the number of fill-ins generated as the result of processing the indicated set. Column 9 indicates the total number of maximal compatibles generated at each step. The last two columns are information generated by the SPICE program about the amount of fill-in generated and the percentage of the matrix which is zero.

As can be seen from the table, in every case the second strategy resulted in equal or more fill-ins and in equal or fewer parallel steps with fewer rows reduced. This indicates that the Markowitz sum is a better heuristic for selecting the set of pivots among many sets. This can be observed from the 16 by 16 matrix of the differential pair circuit. In the first step, with sets of size six, Markowitz sum generated 6 fill-ins while Ored Markowitz generated 8. Over the whole run, as can be seen from the table, Ored Markowitz resulted in twice as many fill-ins as the Markowitz sum, and fewer pivots were processed in parallel (14 for Ored Markowitz and 15 for Markowitz sum). The same behavior resulted from the ECL-compatible SCHMITT trigger circuit, which produced an 18 by 18 matrix. The number of fill-ins at step 2 of parallel triangulation is 10 for Ored Markowitz and only 4 for the Markowitz sum, with none generated in the following steps. Ored Markowitz generated 4 more fill-ins at step 3 and was not able to find any more parallel pivot candidates, while the first strategy continued for one more parallel step. Of course, there are cases where both strategies produced similar or close results, as can be seen from the table. The table also indicates that, in parallel runs, generation of fill-ins is much higher than in sequential runs of the SPICE program. At the same time, it can be seen that the matrices generally do not become dense rapidly, and parallel pivot candidates are available almost until the very last steps of the triangulation process.

The result of our next analysis is shown in Table 1.2. At each step, a set of maximum size and minimum Markowitz sum is selected to reduce the matrix. Furthermore, from this set we repeatedly remove a pivot with maximum Markowitz count and compute the number of fill-ins that would be generated if the resulting set were used to reduce the matrix. As can be seen from the table, in every case it is possible to reduce the number of fill-ins significantly by reducing the amount of parallelism slightly. For example, for the 16 by 16 matrix with 57 nonzeros, if we reduce the number of compatible pivots from 6 to 4 by removing the two pivots with highest Markowitz count from the set, we can prevent the generation of more fill-ins. Also, in the last 24 by 24 matrix with 158 nonzeros, we can reduce the number of generated fill-ins by a factor of 2 (from 40 to 20) if we discard two pivots in step one in the same fashion. This general result can be observed from the table for all cases and all parallel steps.

In the next experiment we confirm that it is possible to reduce the total generation of fill-ins, as opposed to just the fill-ins at each step, by using fewer than the maximum number of compatible pivots. In every case we have been able to reduce the total number of fill-ins by some fraction (at least about one third) compared to the case where maximum parallelism was employed. These results are summarized in Table 1.3. Here we chose to discard the pivot with the highest Markowitz count from the maximal compatible set. If the maximal set would not generate any fill-ins, because of a zero Markowitz sum, we did not discard any pivots from the set. The total number of fill-ins generated for the first matrix (16 by 16) is 2, which is one third of the amount generated in our first experiment (6). This number was reduced from 40 to 26 for the case of the 24 by 24 matrix with 154 nonzeros. In this case the number of parallel steps increased from 5 to 6, but the total number of rows that could be reduced in these steps remained constant. In fact, in most cases the number of parallel steps increases, but the total number of pivots that can be processed in these steps does not change much (no change is greater than one row added to or removed from the number of reduced rows).

Generation of Compatible Sets from the Incompatible Table 

It is clear that in large sparse circuit matrices the number of possible pivots to be processed at each step will be much higher than in our small example matrices, and therefore it will be possible to obtain enough parallel work by considering just a sub-maximal set of compatible pivots at each step. The algorithm described above involves a complete binary tree search and has complexity exponential in the order n of the sparse matrix. In order to come up with a good heuristic, we need to relax the requirement of finding the maximal set of compatible pivots with minimum Markowitz sum. As concluded from the above analysis, we will have to reduce the size of the set to decrease the generation of fill-ins. Keeping these problems in mind, an acceptable set is one which has a large number of pivot candidates for parallel processing and a low enough Markowitz sum. We now need a procedure that tends to produce a number of compatible sets of reasonably large size and low Markowitz sum. Having generated such sets, we can then choose the best candidate among these compatible sets using the same criteria as before. In what follows, we describe the different issues which lead us to a good heuristic algorithm and to a set of parameters for trading off between fill-in generation and the size of the set of parallel pivot candidates.





So far, the information from the incompatible table has been used to construct the maximal compatible sets of pivots in a complete binary tree search algorithm. A more careful analysis of the incompatible table can provide a set of compatible pivots without the need for searching the tree. As we know, this table gives information about the incompatible pairs of pivots. In other words, by looking at column i of the table, corresponding to pivot i, we obtain all pivot numbers j > i such that pivot j is incompatible with pivot i, for a given ordering of the matrix. Note that we are assuming pivots are taken from the diagonal of the matrix and are numbered 1 through n, corresponding to rows 1 through n of the matrix. Consequently, if column i of the table is null, then pivot i is compatible with every pivot whose column lies to the right of column i. Hence, by scanning the incompatible table, we can find the pivots whose corresponding columns in the table are null. Clearly, pivots with this property are mutually compatible and can be grouped in a compatible set. Using the representation of the incompatible table described earlier, the above procedure can be formulated as:

compset := [];
for i := n downto 1 do          (* scan imptbl from right to left *)
  if imptbl[i] = [] then
    (* add the corresponding pivot to the set of compatibles *)
    compset := compset + [i]

where compset is the set of compatible pivots whose corresponding columns in the table are null. Now, if there exists a pivot k such that the set of pivots incompatible with it (column k of the table) is disjoint from the set of already constructed compatible pivots in compset, then k is compatible with every pivot in compset, because the scan runs from right to left and every pivot already in compset therefore lies to the right of k. Hence, we can expand compset by adding k to it. The above procedure can now be completely described as:

compset := [];
for i := n downto 1 do          (* scan imptbl from right to left *)
begin
  if imptbl[i] * compset = [] then   (* '*' is set intersection *)
    (* no pivot in compset is incompatible with i: add [i] *)
    compset := compset + [i]
  else
    delete row i of imptbl
end

The compatible set compset produced by this procedure will be referred to as an ordered compatible set from now on, since it is obtained by imposing a specific ordering on the diagonal elements of the matrix to get the incompatible table. As an example, the incompatible table of matrix A2 in Fig. 2.1.a is given in Fig. 2.1.b. The compatible set corresponding to the null columns of the table consists of pivots 10 and 11. After the above expansion, this set consists of pivots 2, 10, and 11.
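
The expanded procedure amounts to a greedy right-to-left scan. A Python sketch, assuming 0-based pivots, a 0/1 pattern `B`, and a hypothetical `order` parameter for the diagonal ordering; each column of the table is built on the fly rather than read from a stored table, and a rejected pivot simply never enters compset, which has the same effect as deleting its row.

```python
def ordered_compatible_set(B, order=None):
    """Greedy right-to-left scan of the incompatible table for the given ordering."""
    n = len(B)
    order = list(range(n)) if order is None else list(order)
    compset = set()
    for k in range(n - 1, -1, -1):
        p = order[k]
        # Column k: pivots after position k that are incompatible with p.
        column = {q for q in order[k + 1:] if B[p][q] or B[q][p]}
        if not (column & compset):  # everything in compset lies to the right of k
            compset.add(p)
    return compset


# Hypothetical tridiagonal pattern: neighbouring pivots i and i+1 are incompatible.
B = [
    [1, 1, 0],
    [1, 1, 1],
    [0, 1, 1],
]
print(sorted(ordered_compatible_set(B)))             # [0, 2]
print(sorted(ordered_compatible_set(B, [0, 2, 1])))  # [1]
```

Because the result depends on the ordering, the `order` parameter lets any ordering be tried, for instance pivots sorted by decreasing Markowitz number.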

As was explained previously, our strategy for selecting a compatible set
among all possible compatible sets of equal maximum size was to select the
one with minimum Markowitz sum, that is, the set in which the
sum of the Markowitz numbers of its pivots is minimum. If we consider
the set of compatible pivots constructed above directly from the incompatible
table, we see that it consists of pivots 2, 10, and 11, which in turn have Markowitz numbers 0, 4, and 12. In general, we would like to have a compatible
set consisting of pivots with Markowitz numbers as low as possible. It is also
clear that pivots with low Markowitz numbers generally have fewer incompatibilities. Moreover, by looking at the incompatible table of Fig. 2.1.b, we see that
the compatible pivots 10 and 11 are obtained from the right-end portion of the
table. This is usually the case, since as we construct the columns of the incompatible table, we are left with a smaller submatrix to work with. Thus, after
completing each column, we have fewer incompatibles left for the construction of the next column. These observations lead us to use a different ordering in which the first column of the incompatible table has the maximum
number of incompatibles, and as we work our way to the right end of the
table, the number of incompatibles decreases to the minimum. Such an
ordering implies that the resulting incompatible table will have more null columns
clustered at the right end. So the ordered compatible set that can be constructed from the ordered table will tend to be of larger size and smaller Markowitz
sum than the result of the above procedure. As a result of these arguments,
we sort the pivots in order of decreasing Markowitz numbers. Using this new
ordering, we can construct a new incompatible table with the first column
corresponding to the pivot with the highest Markowitz number and the last
column corresponding to the pivot with the lowest Markowitz number. As an
example, the Markowitz numbers and the new ordering of the pivots are
shown in Fig. 2.2.a for matrix A2 of Fig. 2.1.a. The corresponding ordered
incompatible table is given in Fig. 2.2.b. It can be seen from Fig. 2.2.b that
the collection of pivots corresponding to null columns of the table gives a
compatible set of size 4 and Markowitz sum 1, consisting of pivots 1, 2, 3, and
4. This compares with a set of size 2 and Markowitz sum 16 generated
from the unordered incompatible table of Fig. 2.1.b. After expanding this
set, we produce a compatible set of size 5 and Markowitz sum 16, consisting of
pivots 1, 2, 3, 4, and 9.
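Assuming the usual definition of the Markowitz number of a diagonal pivot, (r_i - 1)(c_i - 1), where r_i and c_i count the nonzeros in row i and column i of the unreduced matrix, the decreasing ordering used to build the ordered table can be sketched as below. The sparsity pattern is a hypothetical 4 by 4 arrow matrix with a structurally nonzero diagonal, and the function names are illustrative.

```python
def markowitz_numbers(pattern: set[tuple[int, int]], n: int) -> list[int]:
    """Markowitz number (r_i - 1) * (c_i - 1) of each diagonal pivot (i, i).

    `pattern` lists the 0-based (row, column) positions of the nonzeros of the
    unreduced matrix; every diagonal entry is assumed to be nonzero.
    """
    row_count = [0] * n
    col_count = [0] * n
    for r, c in pattern:
        row_count[r] += 1
        col_count[c] += 1
    return [(row_count[i] - 1) * (col_count[i] - 1) for i in range(n)]


def decreasing_markowitz_order(mark: list[int]) -> list[int]:
    """Pivot indices sorted by decreasing Markowitz number (ties keep input order)."""
    return sorted(range(len(mark)), key=lambda i: -mark[i])


# Hypothetical arrow pattern: diagonal plus a dense last row and column.
arrow = ({(i, i) for i in range(4)}
         | {(3, j) for j in range(3)}
         | {(i, 3) for i in range(3)})
mark = markowitz_numbers(arrow, 4)
print(mark)                              # [1, 1, 1, 9]
print(decreasing_markowitz_order(mark))  # [3, 0, 1, 2]
```

The pivot with the dense row and column moves to the first column of the ordered table, where its many incompatibles do the least harm to the right-to-left scan.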


Limited Binary Search Tree 

In this section, we will combine the idea of an ordered compatible set
with the tree search algorithm described earlier to obtain a limited tree
search algorithm which produces an acceptable set of compatible pivots for
reducing the matrix. Given a set of all pivot elements, we can now directly
produce a set of compatible pivots from the ordered incompatible table. This
ordered compatible set is obtained for the initial starting set at the root of
the binary search tree. A child set in the tree is a subset of its parent set, so
every set below the root has fewer pivots than the root set. Such a set could
be considered as a starting set itself. Provided we could produce the correct
incompatible table for this set, we could generate its corresponding ordered
compatible set directly from the new table.

The incompatible table for a given starting set, S_i, is the original table
with those rows and columns corresponding to the pivots absent from S_i
eliminated. If we let S be the initial set of all pivot candidates in the sparse
matrix and S_i be an arbitrary starting set in the tree, then the procedure to
obtain the ordered compatible set for S_i, compset_i, from an updated and
ordered incompatible table can be represented as:
1.  compset_i = empty
2.  less = S - S_i
3.  for j = n down to 1 do
4.  begin
5.      if (j ∈ S_i) then
6.      begin
7.          tempset = imptbl_j - less
8.          tempset = tempset ∩ compset_i
9.          if (tempset = empty) then
10.             compset_i = compset_i + [j]
11.     end
12. end

where less is the set of pivots absent from S_i. Line 5 allows only those
columns of the incompatible table whose corresponding pivot j is in S_i to be
tested for the compatibility relation. Set less is used in line 7 to eliminate
rows corresponding to the pivots absent from S_i. compset_i holds the current set
of compatible pivots. A check for a new pivot being compatible with those
already in compset_i is made in line 9.
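A minimal Python sketch of the numbered procedure follows, reusing a dictionary layout in which each pivot maps to the set of pivots in its column of the ordered incompatible table; `ordered_compatible_subset` and the toy table are hypothetical. Comments point to the corresponding pseudocode lines.

```python
def ordered_compatible_subset(imptbl: dict[int, set[int]], order: list[int],
                              start_set: set[int]) -> list[int]:
    """Ordered compatible set compset_i for a starting set S_i (sketch).

    `order` is the column order of the ordered incompatible table and
    imptbl[j] is column j (incompatibles of j lying to its right).
    """
    less = set(order) - start_set                  # line 2: pivots absent from S_i
    compset: set[int] = set()
    chosen: list[int] = []
    for j in reversed(order):                      # line 3: j = n down to 1
        if j in start_set:                         # line 5: test only pivots of S_i
            tempset = imptbl.get(j, set()) - less  # line 7: drop rows of absent pivots
            tempset &= compset                     # line 8
            if not tempset:                        # line 9
                compset.add(j)                     # line 10
                chosen.append(j)
    return chosen


table = {1: {2, 3}, 2: {4}, 3: set(), 4: {5}, 5: set()}
print(ordered_compatible_subset(table, [1, 2, 3, 4, 5], {1, 2, 3, 4, 5}))  # [5, 3, 2]
print(ordered_compatible_subset(table, [1, 2, 3, 4, 5], {1, 2, 3, 4}))     # [4, 3]
```

Because compset_i only ever holds pivots of S_i, removing `less` in line 7 does not change the intersection in line 8; it is kept only to mirror the pseudocode.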

It is now possible to produce an ordered compatible set for any set at an
arbitrary point in the tree directly from the incompatible table. Given a
starting set, our method of producing an ordered compatible set tends to generate a large set of low Markowitz sum. Thus, we can produce a number of
ordered compatible sets for many starting sets at different points in the tree
and choose the best candidate among them to reduce the matrix. The following theorem eliminates some of the redundant work.




Theorem 

All ordered compatible sets derived from the starting sets in the binary
search tree at level L - 1 or less are included in the ordered compatible sets generated from the sets at level L of the tree (i.e., it is only necessary to generate
ordered compatible sets for starting sets at level L to cover those at any level l < L).


Proof 

Let S be the initial starting set at the root of the binary tree, consisting
of all pivots P_1, ..., P_n. Let S_0 and S_1 be the left and right children of S. Let
compset be the ordered compatible set obtained directly from the incompatible table for the set S. Similarly, let compset_0 and compset_1 be the ordered
compatible sets corresponding to S_0 and S_1, respectively.

A pivot P_i can split a set S iff:

P_i ∈ S and

{set of incompatibles with P_i} ∩ S ≠ empty.

Assume P_i splits S into S_0 and S_1; then:

S_0 = S - {set of incompatibles with P_i} and
S_1 = S - {P_i}.

There are two cases to consider:

i. P_i ∉ compset

The table corresponding to S_1 consists of the same null columns and
compatible pivots as in compset: since P_i was rejected when compset was
built, removing it changes none of the tests made for the other pivots. So:

compset = compset_1

ii. P_i ∈ compset

Then we must have:

{set of incompatibles with P_i} ∩ compset = empty

since P_i is compatible with all pivots in compset. In this case, compset_0
obtained from S_0 is equal to compset. We know P_i is in the set S_0 and
that the incompatible table for S_0 is the same as the table for the parent
set S with those rows and columns corresponding to incompatibles of P_i
eliminated, none of which belongs to compset. Thus, all the compatible
information which resulted in the production of compset is transferred from
the parent set S to S_0, and consequently:

compset = compset_0.

The above argument proves that, at level 1, one of the sets S_0 or S_1 will
produce the same ordered compatible set as produced by its parent set. This 
proof holds for any two children of a set. In other words, at any point in the 
tree, an ordered compatible set corresponding to a parent set is reproduced 
by one of its children. 

Induction on level verifies that generating the ordered compatible sets 
for every set from the root through level L of the tree does not produce any 
more information than producing the ordered compatible sets for every set at 
level L only. 
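The level-1 step of the argument can be checked empirically on small random incompatibility relations. The harness below is only a sanity check, not part of the algorithm, and all names in it are hypothetical. It assumes a symmetric relation in which each pivot maps to its complete set of incompatibles (row and column of the table combined); during a right-to-left scan this gives the same decisions as column-only lookups, because every accepted pivot lies to the right of the pivot being tested.

```python
import itertools
import random


def ordered_compatible(order, incompat, start):
    """Right-to-left greedy ordered compatible set for the starting set `start`."""
    comp = set()
    for j in reversed(order):
        if j in start and not (incompat[j] & comp):
            comp.add(j)
    return frozenset(comp)


def split(s, p, incompat):
    """Children (S_0, S_1) of set s split on pivot p, or None if p cannot split s."""
    blockers = incompat[p] & s
    if p not in s or not blockers:
        return None
    return s - blockers, s - {p}


def level_one_holds(n, density, seed):
    """True if, for every admissible split of the root, a child reproduces compset."""
    rng = random.Random(seed)
    order = list(range(n))
    incompat = {p: set() for p in order}
    for a, b in itertools.combinations(order, 2):
        if rng.random() < density:
            incompat[a].add(b)
            incompat[b].add(a)
    root = frozenset(order)
    parent = ordered_compatible(order, incompat, root)
    for p in order:
        children = split(root, p, incompat)
        if children is None:
            continue
        produced = {ordered_compatible(order, incompat, c) for c in children}
        if parent not in produced:
            return False
    return True


print(all(level_one_holds(8, 0.3, seed) for seed in range(100)))  # True
```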

As a consequence of the theorem, we generate all the sets at a given level 
in the binary tree, and for each set, we produce an ordered compatible set 
from the ordered incompatible table. Among the generated compatible sets 
we choose the set of largest size and lowest Markowitz sum to reduce the 
matrix and call the resulting set the elimination set.

Since each set at each level of the tree is split on a given pivot according
to that pivot's incompatibility information, the starting sets at different
levels can be generated in various ways:

i. We could split the starting sets using the original pivot ordering given by 
the input sparse matrix. This would generate completely random 
results. 



ii. The same ordering used to order the incompatible table could be used to
split the sets. This left-to-right ordering does not seem to agree with our
Markowitz sum requirement. At each split (level in the tree) we
include one of the pivots, say p_i, with the highest number of incompatibles
(highest Markowitz number) in the left subtree. This inclusion also
means we take a large number of pivots incompatible with p_i out of the
sets in the left subtree. These pivots that are incompatible with p_i have
lower Markowitz numbers than p_i and could themselves be compatible
with some other elements in the set. As a result, this ordering will produce a left set considerably smaller in size than the resulting right
set. Moreover, the left set contains pivots of high Markowitz number, which
could produce many fills if used to reduce the matrix. Therefore, some
of the large compatible sets with small Markowitz sums cannot be generated from one of the sets in the left subtree unless we search very deep
in the tree. In that case, such compatible sets would come from one of
the right subtrees.

iii. A third alternative would be to split the sets with pivots in increasing
order of their Markowitz numbers. Of course, in this case, the incompatibility information of the pivots used to split a starting set is taken from
the right end of the incompatible table. Thus the complete incompatibility information for a pivot i is obtained by concatenating row i and
column i of the table. This process seems to give a better balance to the
binary tree for the first few levels used to generate the starting sets
required in our algorithm. Furthermore, it has the property that it does
not ignore pivots of low Markowitz numbers.
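Generation of the starting sets at level ULEVEL can be sketched as follows, with the left child S_0 = S - {incompatibles of p} and the right child S_1 = S - {p}, as in the proof of the theorem. Two choices here are assumptions rather than statements from the text: one splitting pivot is consumed per level, and a set that the current pivot cannot split is carried down unchanged. The incompatibility data and the name `starting_sets` are hypothetical.

```python
def starting_sets(pivots, incompat, split_order, ulevel):
    """Starting sets at depth `ulevel` of the binary split tree (sketch).

    incompat[p]: full set of pivots incompatible with p (row and column of the table).
    split_order: pivots in the order used to split, e.g. increasing Markowitz number.
    """
    level = [frozenset(pivots)]
    for p in split_order[:ulevel]:           # one splitting pivot per level
        next_level = []
        for s in level:
            blockers = incompat.get(p, set()) & s
            if p in s and blockers:
                next_level.append(s - blockers)  # left child S_0: keep p, drop its incompatibles
                next_level.append(s - {p})       # right child S_1: drop p
            else:
                next_level.append(s)             # assumption: p cannot split s, carry it down
        level = next_level
    return level


incompat = {1: {2, 3}, 2: {1, 4}, 3: {1}, 4: {2, 5}, 5: {4}}
for s in starting_sets({1, 2, 3, 4, 5}, incompat, [3, 5, 1], ulevel=2):
    print(sorted(s))
# [2, 3, 5]
# [2, 3, 4]
# [1, 2, 5]
# [1, 2, 4]
```

At most 2^ULEVEL starting sets are produced, which is why a small, fixed ULEVEL keeps this step independent of n.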

The high level description of this algorithm is given below: 

Program Parallel Pivoting 

- calculate Markowitz numbers of pivots in the
  remaining unreduced matrix.

- SORT pivots in decreasing order of Markowitz numbers.

- produce all starting sets at level ULEVEL, taking the
  pivots to split the sets from the root to ULEVEL in
  order of increasing Markowitz numbers.

- for each set at ULEVEL, produce an ordered compatible set from
  the updated ordered incompatible table.

- among the ordered compatible sets generated above, choose the
  maximum-sized set with minimum Markowitz sum (the elimination set).
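The final selection step reduces to a lexicographic comparison: the largest set first, then the smallest Markowitz sum. A minimal sketch, with a hypothetical `select_elimination_set` and toy Markowitz numbers:

```python
def select_elimination_set(candidates, mark):
    """Largest candidate set; ties broken by the smallest Markowitz sum."""
    return max(candidates, key=lambda s: (len(s), -sum(mark[p] for p in s)))


mark = {1: 0, 2: 4, 3: 1, 4: 9, 5: 2}
candidates = [{1, 2}, {3, 4}, {1, 3}, {2, 5}]
print(sorted(select_elimination_set(candidates, mark)))  # [1, 3]
```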

where ULEVEL is the level number indicating the depth of the tree to be
searched. The algorithm is no longer exponential in time. An efficient implementation of the required sort and set operations is an important factor in
efficient execution of the algorithm. The set operations used in the construction of the incompatible table are of order 1 (adding an element to a set or
testing for membership). The incompatible table can therefore be constructed
in time proportional to nz, where nz is the number of nonzero elements of the matrix. Generation of an ordered compatible set from the incompatible table requires scanning n sets corresponding to the columns of the table, and performing intersection and difference operations on the sets. These operations are of order n
with a constant factor equal to the inverse of the number of bits per computer word. The set operations are usually implemented in machine language
or microcode and thus have a small time factor. They could be considered to
have a constant time (rather than order of n) compared to the time taken to
execute a high-level language statement. Production of all starting sets for
level ULEVEL takes constant time. Generation of an ordered compatible set
for each starting set at ULEVEL takes time proportional to n, as explained above.
For reasonable values of ULEVEL, all ordered compatible sets can be derived
in parallel for different starting sets. In the next section we will see that good
results are obtained for small, constant values of ULEVEL compared to n.
The complexity of the algorithm is bounded above by the sorting algorithm.
Thus, employing an efficient parallel sort would improve the performance of
the new algorithm.
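One way to obtain word-parallel set operations, sketched here in Python, is to store each column of the incompatible table as a bit vector held in a Python integer, so that intersection and difference become `&` and `& ~`. This layout and the name `ordered_compatible_bits` are assumptions for illustration. In CPython these operators process every internal digit of the integer, so their cost grows with n divided by the digit width rather than being truly constant, which matches the constant factor described above.

```python
def ordered_compatible_bits(cols: list[int], start_mask: int) -> int:
    """Ordered compatible set with the table stored as bit vectors (sketch).

    cols[j] has bit k set when pivot k (a position to the right of j in the
    ordered table) is incompatible with pivot j; start_mask marks S_i.
    """
    n = len(cols)
    less = ((1 << n) - 1) & ~start_mask    # pivots absent from the starting set
    compset = 0
    for j in range(n - 1, -1, -1):         # scan the columns from right to left
        if (start_mask >> j) & 1:
            if ((cols[j] & ~less) & compset) == 0:
                compset |= 1 << j          # j is compatible with every accepted pivot
    return compset


# Column j lists incompatible positions to its right: 0:{1,2}, 1:{3}, 3:{4}.
cols = [0b00110, 0b01000, 0b00000, 0b10000, 0b00000]
print(bin(ordered_compatible_bits(cols, 0b11111)))  # 0b10110, i.e. pivots 1, 2, 4
print(bin(ordered_compatible_bits(cols, 0b01111)))  # 0b1100, i.e. pivots 2, 3
```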

Balance between Parallelism and Fill-in Generation 

Even though the above procedure tends to produce large sets of low Mar- 
kowitz sums, we still could optimize the generation of fill-ins by considering a 
subset of the elimination set. That is, there could still be some room for trad- 
ing off between parallelism and fill generation. To accomplish this task, we 
need to come up with parameters to control the number of pivots to be processed in parallel and the number of fills to be generated. One such parameter could be the size of the set of compatible pivots. By allowing a percentage
of the set to be discarded, we can control the number of compatible
pivots to a degree that does not limit our parallel work by too much. For
pivots to a degree that does not limit our parallel work by too much. For 
clarity, this parameter is called the shrinkage parameter and is used as a 
lower limit to shrink the elimination set by a percentage of its size. A 
different parameter could be an upper limit on the size of the elimination set. 
This limit would allow just enough work to keep our parallel processes busy. 
Of course, shrinking of the elimination set must not be done arbitrarily by
throwing pivots out of the set. In general, we would like to shrink our set by 
discarding pivots that would cause generation of many fills. Such pivots tend 
to have high Markowitz numbers. We already have pivots ordered according 
to their Markowitz numbers. We could use this ordering to scan pivots with 
highest Markowitz number in the elimination set and test against a threshold 
value. If pivots with Markowitz numbers greater than a threshold exist and 
if our shrinkage parameter allows, they are discarded from the elimination 
set. Use of a threshold value prevents us from shrinking a set that consists of
all good pivot candidates of reasonably low Markowitz numbers. To serve 
this purpose, the threshold value should be set in comparison with low and 
high Markowitz numbers of the pivoting elements in the matrix. Again the 
ordered Markowitz numbers of pivots can be used to set such a threshold 
value conveniently. One way is to specify a fraction of candidates to be dis- 
carded from the elimination set. Consequently, we set the threshold to the 
Markowitz number of a pivot in a specific position in the list of pivot elements of the unreduced matrix (ordered by decreasing Markowitz numbers).
Any pivot above this point in the ordered list is considered to have a high 
Markowitz number and therefore is a candidate for being discarded from the 
set, and any pivot below this point is considered acceptable. 
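Setting the threshold from a fraction of the ordered pivot list can be sketched as below. The list of Markowitz numbers is hypothetical, and the rounding rule for the position (floor, clamped to the last entry) is an assumption of this sketch; `Fraction` is used so that positions such as 1/3 are computed exactly.

```python
from fractions import Fraction


def markowitz_threshold(marks_desc: list[int], fraction: Fraction) -> int:
    """Markowitz number found at the given fraction of the ordered pivot list.

    marks_desc: Markowitz numbers of the pivots of the unreduced matrix, sorted
    in decreasing order. Pivots whose Markowitz number exceeds the returned
    value are candidates for being discarded.
    """
    position = min(int(fraction * len(marks_desc)), len(marks_desc) - 1)
    return marks_desc[position]


marks_desc = [12, 9, 6, 4, 4, 1, 0, 0, 0]
print(markowitz_threshold(marks_desc, Fraction(1, 3)))   # 4
print(markowitz_threshold(marks_desc, Fraction(1, 10)))  # 12
```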
Pivots in the elimination set are scanned in order of decreasing Markowitz
number. If a pivot with Markowitz number greater than the threshold exists
and if the set is not already of minimum size, it is discarded from
the set. The process is repeated until either no more pivots of large Markowitz numbers are left in the set or the set cannot be further shrunk. In the
next section we present the results of the different strategies and the various parameters discussed here for a number of test matrices.
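The shrinking procedure can be sketched as follows, with the shrinkage parameter given as an integer percentage and an optional upper limit on the set size (25 in the experiments reported in the next section). Applying the upper limit after the threshold scan is an assumption, as are the name `shrink_elimination_set` and the toy data.

```python
def shrink_elimination_set(elim, mark, threshold, shrink_percent, upper_limit=None):
    """Discard high-Markowitz pivots from an elimination set (sketch).

    shrink_percent: at most this percentage of the set may be discarded
                    because of the threshold (the shrinkage parameter).
    upper_limit:    optional cap on the set size; the excess pivots with the
                    highest Markowitz numbers are dropped (applied last here).
    """
    ordered = sorted(elim, key=lambda p: mark[p], reverse=True)  # highest first
    max_discard = len(ordered) * shrink_percent // 100
    start = 0
    while start < max_discard and mark[ordered[start]] > threshold:
        start += 1
    kept = ordered[start:]
    if upper_limit is not None and len(kept) > upper_limit:
        kept = kept[len(kept) - upper_limit:]
    return kept


mark = {p: p for p in range(1, 11)}   # toy data: Markowitz number of pivot p is p
elim = list(range(1, 11))
print(shrink_elimination_set(elim, mark, threshold=7, shrink_percent=30))
# [7, 6, 5, 4, 3, 2, 1]
print(shrink_elimination_set(elim, mark, threshold=9, shrink_percent=30))
# [9, 8, 7, 6, 5, 4, 3, 2, 1]
```

In the first call the 30% shrinkage limit, not the threshold, stops the scan; in the second the threshold stops it after one pivot.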

Analysis of the Results 

The complexity of the binary tree search algorithm to obtain maximal 
compatible sets was such that it could not be run to completion for a 38 by 
38 matrix. To verify the validity of our heuristic program, we performed 
every analysis described in this section on the small test matrices of Table 
1.1. Recall that the new algorithm produces a number of starting sets for a 
given level (ULEVEL) of the binary search tree. For each starting set, an 
ordered compatible set is produced. Among the generated ordered compati- 
ble sets, the set with maximum size and minimum Markowitz sum is selected 
as the elimination set at that parallel step. Two alternative orderings for 
generation of starting sets at ULEVEL were discussed earlier. For simplicity, 
we call the algorithm to reduce a sparse matrix by compatible pivots using 
the decreasing order of Markowitz numbers for starting set splitting, 
DCOMP. Similarly, the algorithm which uses the increasing order of Markowitz numbers is called ICOMP.

Detailed information produced by DCOMP and ICOMP is presented for
three sparse matrices in Table 2.1. Column 1 of the table gives a description of the sparse matrix under consideration. Column 2 specifies the parallel step. Columns 3, 4, and 5 give the number of compatible pivots in the
elimination set, its Markowitz sum and number of fill-ins generated at each 
step for program DCOMP. Similar information is summarized in the next 
three columns for program ICOMP. The information presented here is for 
ULEVEL = 4. The first two matrices have been completely analyzed in the
previous section and are presented here to show the validity of our proposed 
algorithms. It is interesting to see that, for the first matrix, DCOMP pro- 
duced exactly the same results as the complete tree search program. On the 
other hand, ICOMP produced different results. Even though ICOMP pro- 
duces a smaller compatible set in the first step, it finds larger sets in the next 
steps and reduces the same number of rows (i.e., 21) in five parallel steps. 
ICOMP generates 22 fills, almost half the number produced by DCOMP (40) 
or even the complete binary tree search algorithm (40). The same behavior is 
observed from the second 24 by 24 matrix. The third matrix is obtained from 
the circuit of an 8-bit full adder and is a 144 by 144 matrix with 616 
nonzeros. Note that both algorithms produced an elimination set of 72 pivots 
in the first step, and so, half of the matrix can be reduced in parallel in one 
step. In this case the advantage of ICOMP over DCOMP is not significant. 

To see how variation of depth will affect the resulting compatible sets, 
we ran both programs for values of ULEVEL between 2 and 5, for a number 
of matrices. These results are summarized in Table 2.2. Again the first 
column describes the matrix. The second column specifies ULEVEL. 
Columns 3 to 6 give related information for the DCOMP program. Here the
first and third of these columns specify the number of parallel steps taken to reduce
the matrix and the number of rows reduced in those steps, respectively. The
second column is the average parallel work at each step and is obtained by
dividing the total number of rows reduced in parallel by the number of parallel
steps. The fourth column gives the total number of fill-ins generated by
parallel reduction. The next four columns of the table provide similar information for the ICOMP program. The last two matrices of the table are produced from the SPAR program, which is a structural analysis program [12].
These two matrices have a peculiar block structure. Our initial objective was 
to study sparse matrices arising from SPICE. These matrices ordinarily have 
a random sparsity structure, but at the same time, the limited connectivity
between nodes of the input circuit results in a limited number of nonzeros per 
row/column. The SPAR matrices will provide some insight into the behavior 
of our heuristic algorithms for a wider class of matrices. 

It is clear from the table that, in almost every case, ICOMP produces 
better results both in terms of number of rows reduced in parallel and 
number of fill-ins generated. As was expected, DCOMP finds elimination sets 
of lower Markowitz sums as we search deeper in the tree. This is observed 
from the first 21 by 21 matrix and from the last SPAR-generated 505 by 505
matrix. In the first matrix, ICOMP produced 18 fills, reducing 21 rows in 5
parallel steps, while DCOMP generated more than twice the number of fills
and reduced 20 rows in 5 steps. The number of fills decreases for DCOMP as
ULEVEL is increased, while ICOMP shows the opposite trend. This also
shows that reasonably acceptable compatible sets, both in terms of size and
Markowitz sum, are generated for small values of ULEVEL and it is not
necessary to search very deep in the tree. The above observations hold for 
every matrix presented in the table, except the 78 by 78 matrix produced by 
the SPAR program. This matrix does not have characteristics typical of 
SPICE generated matrices; but, as we will see in our next analysis, acceptable 
results are produced for this matrix as well. Note that there are cases for the 
DCOMP program in which a higher average parallelism is indicated in the 
table than for ICOMP. In those situations, it is often the case that fewer 
rows have been reduced by DCOMP than by ICOMP.

The remaining analyses are performed on the ICOMP program only, 
since it produces better results. In what follows, a value of 4 is used for
ULEVEL. The next step is to study the effects of varying the parameters proposed earlier to obtain a balance between generation of fill-ins and the
amount of parallel work. Results are summarized in Figures 2.3 to 2.6 for 
four of the matrices of Table 2.2. In these graphs, four different symbols are 
used to represent four different values of the threshold parameter. Recall 
that the threshold is set to the Markowitz number of a specific pivot in the 
ordered list of pivot candidates. On the graphs, the threshold value is given 
as a fraction of the pivoting elements in the remaining unreduced matrix, 
sorted in order of decreasing Markowitz numbers. For example, when the
threshold is 1/3, the Markowitz number of the pivot residing at the 1/3 point
of the ordered list of pivot candidates in the unreduced matrix is obtained.
Any pivot in the elimination set with Markowitz number greater than this
value is a candidate to be discarded from the set.

The graphs present information about the number of generated fill-ins 
versus the shrinkage parameter. In each case, the analysis is performed for 
threshold values of 1/10, 1/3, 1/2, and 2/3. For every value of the threshold, 
shrinkage parameter values of 0, 5, 30, and 40 percent are considered. Here 
we have limited our sets to be at most of size 25. Thus any set which consists 
of more than 25 compatible pivots is reduced to size 25 by discarding pivots 
of highest Markowitz numbers. From the information presented in the 
graphs of Figs. 2.3 to 2.6, it is apparent that we have been able to reduce the
generation of fill-ins in every case. For the 21 by 21 matrix (Fig. 2.4), the
number of fills is reduced by 14% over the range of threshold values considered. This number is higher for the rest of our test cases. We have
obtained an overall reduction of 30%, 36%, and 50% in the number of fill-ins
produced by ICOMP for the 8-bit full adder matrix and the two SPAR
matrices, respectively. It is important to note that the number of parallel 
steps taken to reduce the matrices and the number of rows reduced in those 
steps did not change considerably with a change in the above parameters. In 
each case, the change was not more than one in the number of steps or two in 
the number of reduced rows. Thus we have been able to reduce the generation of fill-ins considerably by giving up an insignificant amount of the existing parallel capability. In other words, by employing the above parameters, a
better balance between the number of compatible pivots generated at different
steps is achieved.

A characteristic of these application sparse matrices is that many pivots 
have equal Markowitz numbers. When pivots are ordered, those with equal 
Markowitz numbers are clustered together. Therefore, when the value of the 
shrinkage parameter is small, the threshold has no effect. This can be seen 
for different values of the threshold parameter for the shrinkage parameter 
equal to 5% in any of the graphs of Fig. 2.3 through Fig. 2.6. As this reduc- 
tion percentage is increased, the threshold parameter plays a more effective 
role. In the 8-bit full adder matrix, shrinking the set size by 5% accounts for 
most of the reduction in fill-ins, and after that, the changes are not
significant. For the two matrices generated by the SPAR program, results 
are more evenly distributed over the changes in the above parameters (Fig- 
ures 2.5 and 2.6). Different values of these parameters do not start to affect 
the results until we allow a large fraction of pivots to be discarded in the 24
by 24 matrix of Fig. 2.4. 

Recall that we have restricted the compatible sets to be at most of size 
25. For the matrices that generate considerably larger sets, the threshold 
parameter does not play an important role because an upper limit is imposed 
on the size of the elimination set. To observe the effects of the threshold 
parameter, the upper limit is ignored. Results for the same values of the 
threshold and shrinkage parameter for the 8-bit full adder and the 505 by 505 
SPAR matrix are presented in Fig. 2.7 and Fig. 2.8. The other two matrices 
do not generate as large sets and, therefore, are not affected by the removal of 
the upper limit parameter. A threshold value of 1/10 does not cause any 
reduction in fill-ins in either case. This suggests that the generated 
compatible sets consist of pivoting elements of lower Markowitz numbers
than the highest one tenth. These graphs show more variation with the 
threshold parameter, even though a higher number of fill-ins is produced. 
Without exception, the fewest fill-ins are produced for the largest values of 
the threshold and the shrinkage parameter. 

The number of fill-ins produced by ICOMP compares reasonably with
sequential runs on the same matrices. For example, the sequential run on the
8-bit full adder matrix produced 166 fills, and ICOMP produced 196, which is
an increase of about 18%. In general, as expected, the number of fill-ins produced by ICOMP is higher than in the sequential results, but the difference is
not great.
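The quoted increase follows from the two fill-in counts for the 8-bit full adder matrix:

```latex
\frac{196 - 166}{166} = \frac{30}{166} \approx 0.181
```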

Conclusion 

Solution of sparse systems of equations is essential in many application 
programs. Often such a system has to be solved repeatedly. In this paper we 
verified that in sparse matrices arising from electronic circuits it is possible to 
do computations on many diagonal elements simultaneously. A complete 
analysis of some test matrices, done by generating all maximal compatible 
sets of pivot elements, indicated the existence of many compatible pivots in 
these matrices. We have shown our test matrices do not become full during 
the decomposition. Furthermore, it was shown that many parallel computa- 
tion steps are possible, and during these steps, the matrix is often reduced 
completely. The competing issues of parallel pivoting and fill-in generation 
have been studied, and we verified through examples that it is possible to 
reduce the production of fill-ins by removing some of the parallel pivot candi- 
dates from the elimination set on the basis of high Markowitz numbers. A 
heuristic algorithm was then proposed to produce large compatible sets of low 
Markowitz sums by a combination of an ordered partial tree search strategy 
and generation of ordered compatible sets. Different orderings to produce the 
ordered compatible sets were suggested, and their advantages and disadvan- 
tages were discussed and verified through the simulated results. A number of 
parameters to provide a balance between generation of fill-ins and the 
amount of parallel work were suggested, and their effects were determined in 
the simulated results. 

The incompatible table required by the algorithm can be constructed in 
time nz (number of nonzero elements of the matrix). Production of starting 
sets for a given ULEVEL takes constant time. For ULEVEL small and constant compared to n, generation of ordered compatibles from starting sets is
of order n set intersection and difference operations. Assuming efficient 
implementation of the set operations is available, the heuristic algorithm has 
a complexity bounded above by the sorting algorithm required in the pro- 
gram. Thus, employing an efficient parallel sort program would improve the 
total performance of the new algorithm. Nevertheless, our results show that 
many compatible pivots are produced for parallel reduction of the sparse 
matrices, and the process can be repeated until the matrix is almost com- 
pletely reduced. In cases where the matrix is not completely reduced, the 
remaining submatrix is of such a small size that parallel operations have little 
effect. Significant reduction in generation of fill-ins is obtained by varying 
the proposed parameters. Moreover, as a result of these parameters, a
better balance between the number of compatible pivots generated at 
different steps was achieved, while the reduction in parallel work proved to 
be insignificant. 
Table and figure titles: Parallel Pivoting Strategies; Fill-in Statistics; PARALLELISM vs. FILL-IN.